Search results

Article
Publication date: 3 November 2020

Jagroop Kaur and Jaswinder Singh

Abstract

Purpose

Normalization is an important step in any natural language processing application that handles social media text. Text from social media poses problems that are not present in regular text. A considerable amount of work has recently been done in this direction, but mostly for English. People who are not native English speakers often code-mix their native language into the text and post it on social media using the Roman script, which further aggravates the normalization problem. This paper discusses the concept of normalization with respect to code-mixed social media text and proposes a model to normalize such text.

Design/methodology/approach

The system is divided into two phases: candidate generation and most probable sentence selection. Candidate generation is treated as a machine translation task in which Roman text is the source language and Gurmukhi text is the target language. A character-based translation system is proposed to generate candidate tokens. Once candidates are generated, the second phase uses beam search to select the most probable sentence based on a hidden Markov model.
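The second phase can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `beam_search`, the bigram-style transition table, the placeholder tokens, and all probabilities are assumptions made for illustration. Each position carries a list of candidate tokens with a generation (emission) log-probability, and the beam keeps the highest-scoring partial sentences under an HMM-style score.

```python
import math

def beam_search(candidates, trans_logp, beam_width=3, unk_logp=-10.0):
    """Pick the most probable token sequence from per-position candidates.

    candidates: list with one entry per input token; each entry is a list of
                (candidate_token, emission_log_prob) pairs.
    trans_logp: dict mapping (previous_token, token) to a log-probability.
    unk_logp:   assumed fallback score for unseen transitions.
    """
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, sequence so far)
    for options in candidates:
        if not options:
            raise ValueError("every position needs at least one candidate")
        expanded = []
        for score, seq in beams:
            for token, emit_logp in options:
                step = trans_logp.get((seq[-1], token), unk_logp) + emit_logp
                expanded.append((score + step, seq + [token]))
        # Keep only the best `beam_width` partial sentences.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]
    best_score, best_seq = beams[0]
    return best_seq[1:], best_score  # drop the start symbol

# Toy example with placeholder tokens (hypothetical values, not real data).
toy_candidates = [
    [("a1", math.log(0.6)), ("a2", math.log(0.4))],
    [("b1", math.log(0.5)), ("b2", math.log(0.5))],
]
toy_transitions = {
    ("<s>", "a1"): math.log(0.5), ("<s>", "a2"): math.log(0.5),
    ("a1", "b1"): math.log(0.1), ("a1", "b2"): math.log(0.9),
    ("a2", "b1"): math.log(0.8), ("a2", "b2"): math.log(0.2),
}
best_sentence, best_logp = beam_search(toy_candidates, toy_transitions, beam_width=2)
```

Working in log space avoids numerical underflow on long sentences, and the beam width trades search accuracy against cost. The width used in the paper is not stated in the abstract.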

Findings

Character error rate (CER) and bilingual evaluation understudy (BLEU) scores are reported. The proposed system is compared with Akhar software and the RB_R2G system, both of which can also transliterate Roman text to Gurmukhi. The proposed system outperforms Akhar software. For ill-formed text, the CER and BLEU scores are 0.268121 and 0.6807939, respectively.
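For readers unfamiliar with the metric, CER is commonly computed as the character-level edit distance between the system output and the reference, divided by the reference length. The sketch below follows that common definition. The abstract does not specify the exact variant used in the paper, and the function names here are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted over code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because Python strings are sequences of Unicode code points, a Gurmukhi vowel sign or other combining mark counts as its own character here, so a single missing diacritic contributes one edit.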

Research limitations/implications

It was observed that the system sometimes produces dialectal variants of a word, or the word with minor errors such as a missing diacritic. A spell checker could improve the output by correcting these minor errors. Extensive experimentation is needed to optimize the language identifier, which would further improve the output. The language model also warrants further exploration. Inclusion of wider context, particularly from social media text, is an important area that deserves further investigation.

Practical implications

The practical implications of this study are: (1) development of a parallel dataset containing Roman and Gurmukhi text; (2) development of a dataset annotated with language tags; (3) development of a normalization system that is the first of its kind, proposes a translation-based solution for normalizing noisy social media text from Roman to Gurmukhi, and can be extended to any pair of scripts; and (4) use of the proposed system for better analysis of social media text. Theoretically, the study contributes to a better understanding of text normalization in the social media context and opens the door to further research on multilingual social media text normalization.

Originality/value

Existing research focuses on normalizing monolingual text. This study contributes to the development of a normalization system for multilingual text.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 4
Type: Research Article
ISSN: 1756-378X

Book part
Publication date: 24 November 2023

Sudhir Rana, Jagroop Singh and Sakshi Kathuria

Abstract

The study responds to the common concerns of authors, reviewers, and editors about writing and publishing high-quality literature review (LR) studies. First, we developed the background and decision elements for the five parameters of a quality LR paper, Planning, Operationalizing, Writing, Embedding, and Reflecting (POWER), from editorials and guiding literature. Statistical analysis and refinement of 256 responses from writers, reviewers, and editors revealed 37 decision elements. Finally, a multicriteria decision-making approach was applied to the detailed responses from the lead editors of ABDC, Scopus, ABS, and WoS journals, and 31 decision elements were found strong enough to represent these five parameters of LR quality. All five parameters were found to be important. However, a high priority is suggested for embedding (the results coming out of the review) and operationalizing (the process of conducting the review), while reflecting, writing, and planning remain important. The purpose of the POWER framework is to overcome the challenges and decision dilemmas faced by writers and decision-makers. The framework acts as a guiding tool for conducting LR studies in general and for business management scholars in particular. In addition, this study provides its stakeholders with a checklist (Table 6) and a template (Appendix A1) for a quality LR study.

Details

Advancing Methodologies of Conducting Literature Review in Management Domain
Type: Book
ISBN: 978-1-80262-372-7
